🎮 Reinforcement Learning - chris1 · Scour

Sampling-Based Safe Reinforcement Learning ♟️Game Theory

GRIP-VLM: RL for Efficient Vision-Language Models 💬LLMs

startuphub.ai·6d

Scaling Reinforcement Learning at Applied Compute 🤖AI Agents

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play 🤖AI Agents

vmax.ai·5h·Hacker News

https://odyssey.ml/research 🤖AI Agents

Learning Systems and Innate Behavior 🤖AI Agents

costa-and-associates.com·10h·Hacker News

Reinforcement Learning: An Introduction (2nd Edition) 📐ML Theory

chizkidd.github.io·5d

How Auto Transport Companies Are Leveraging AI for Precision Logistics 🤖Machine Learning

haulin.ai·22h·DEV

The Safety Paradox: How RLHF Creates the AI Psychosis Problem It’s Meant to Prevent 📡Information Theory

promptinjection.net·2d·Hacker News

Long Context Pre-Training w/ Lighthouse Attention 💬LLMs

mail.bycloud.ai·1d

Cursor bets on cheaper coding with Composer 2.5 and Kimi K2.5 🛠️Developer Tools

thenewstack.io·16h

inclusionAI/Ring-2.6-1T 🕸️WebAssembly

huggingface.co·6d·Hacker News, r/LocalLLaMA

Training SID-1 to beat GPT-5 at search with 1k+ QPS RL 🔍RAG

turbopuffer.com·1d·Hacker News

Olympiad Gold Medals Packaged into a Two-Step Recipe 🤖Machine Learning

ai-brief.liziran.com·3d

A lock proves the security of the room and not that the room is empty 🤖AI Agents

github.com·2d·Hacker News

Cursor launches Composer 2.5 model for long-running AI coding tasks at cheaper token cost 🏢Software Industry

indianexpress.com·1d

[MIT] RLCR: Teaching AI models to say "I'm not sure" 🤖Machine Learning

csail.mit.edu·6d·r/LocalLLaMA

Massachusetts Institute of Technology Introduction to Deep Learning 🧠Neural Networks

i-programmer.info·1d

Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning 💬LLMs

rl for red teaming: training models to attack and defend themselves 🤖AI Agents

castform.com·6d·Hacker News
